
    Unveiling User Behavior on Summit Login Nodes as a User

    We observe and analyze usage of the login nodes of the leadership-class Summit supercomputer from the perspective of an ordinary user (not a system administrator) by periodically sampling user activities (job queues, running processes, etc.) for two full years (2020-2021). Our findings unveil key usage patterns that evidence misuse of the system, including gaming the policies, impairing I/O performance, and using login nodes as a sole computing resource. Our analysis highlights observed patterns for the execution of complex computations (workflows), which are key for processing large-scale applications. Comment: International Conference on Computational Science (ICCS), 202
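The abstract does not describe the sampling tooling. The following is only a minimal sketch of the periodic-sampling idea, assuming a Unix-like login node where `ps -eo user=` lists the owner of every process without a header line. The function names and the per-user process count are illustrative assumptions, not the authors' method.

```python
import subprocess
import time
from collections import Counter


def parse_ps_output(text: str) -> Counter:
    """Count processes per user from `ps -eo user=` output (one user name per line)."""
    counts = Counter()
    for line in text.splitlines():
        user = line.strip()
        if user:  # skip blank lines
            counts[user] += 1
    return counts


def sample_processes() -> Counter:
    """Take one snapshot of running processes on this node (assumes `ps` is installed)."""
    out = subprocess.run(
        ["ps", "-eo", "user="], capture_output=True, text=True, check=True
    ).stdout
    return parse_ps_output(out)


def sample_periodically(n_samples: int, interval_s: float, sampler=sample_processes):
    """Collect n_samples (timestamp, counts) snapshots, sleeping interval_s between them."""
    samples = []
    for i in range(n_samples):
        samples.append((time.time(), sampler()))
        if i < n_samples - 1:  # no need to sleep after the last snapshot
            time.sleep(interval_s)
    return samples
```

Passing the sampler as a parameter keeps the scheduling loop separate from what is sampled, so the same loop could, under these assumptions, also poll a job-queue command.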

    Towards Lightweight Data Integration using Multi-workflow Provenance and Data Observability

    Modern large-scale scientific discovery requires multidisciplinary collaboration across diverse computing facilities, including High Performance Computing (HPC) machines and the Edge-to-Cloud continuum. Integrated data analysis plays a crucial role in scientific discovery, especially in the current AI era, by enabling Responsible AI development, FAIR data, Reproducibility, and User Steering. However, the heterogeneous nature of science poses challenges such as dealing with multiple supporting tools, cross-facility environments, and efficient HPC execution. Building on data observability, adapter system design, and provenance, we propose MIDA: an approach for lightweight runtime Multi-workflow Integrated Data Analysis. MIDA defines data observability strategies and adaptability methods for various parallel systems and machine learning tools. With observability, it intercepts the dataflows in the background without requiring instrumentation, while integrating domain, provenance, and telemetry data at runtime into a unified database ready for user steering queries. We conduct experiments showing end-to-end multi-workflow analysis integrating data from Dask and MLFlow in a real distributed deep learning use case for materials science that runs on multiple environments with up to 276 GPUs in parallel. We show near-zero overhead running up to 100,000 tasks on 1,680 CPU cores on the Summit supercomputer. Comment: 10 pages, 5 figures, 2 Listings, 42 references, Paper accepted at IEEE eScience'2
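The abstract does not show MIDA's interface, so the following is a hypothetical sketch of the general idea of runtime provenance capture: each task's inputs, output, and timing are written to a single queryable store while the workflow runs. `capture_provenance`, the `tasks` table schema, and `train_step` are illustrative assumptions. Unlike MIDA, which intercepts dataflows without requiring instrumentation, this sketch does require decorating each task.

```python
import functools
import json
import sqlite3
import time

# Hypothetical unified store: one row per task execution.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE tasks (name TEXT, inputs TEXT, output TEXT, started REAL, duration_s REAL)"
)


def capture_provenance(fn):
    """Record the inputs, output, and wall-clock timing of every call to fn."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = fn(*args, **kwargs)
        duration = time.time() - start
        db.execute(
            "INSERT INTO tasks VALUES (?, ?, ?, ?, ?)",
            (
                fn.__name__,
                json.dumps({"args": args, "kwargs": kwargs}, default=str),
                json.dumps(result, default=str),
                start,
                duration,
            ),
        )
        db.commit()
        return result
    return wrapper


@capture_provenance
def train_step(epoch: int, lr: float) -> dict:
    # Stand-in for a real training task; returns a fake loss metric.
    return {"epoch": epoch, "loss": 1.0 / (epoch + 1)}


for e in range(3):
    train_step(e, lr=0.01)

# A steering-style query over the captured data: which epochs reached loss < 0.6?
rows = [
    json.loads(out)
    for (out,) in db.execute(
        "SELECT output FROM tasks WHERE name = ? ORDER BY rowid", ("train_step",)
    )
]
low_loss = [r["epoch"] for r in rows if r["loss"] < 0.6]
```

Here `low_loss` evaluates to `[1, 2]`, because the fake losses are 1.0, 0.5, and about 0.33.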

    F*** workflows: when parts of FAIR are missing

    The FAIR principles for scientific data (Findable, Accessible, Interoperable, Reusable) are also relevant to other digital objects such as research software and scientific workflows that operate on scientific data. The FAIR principles can be applied to the data being handled by a scientific workflow as well as the processes, software, and other infrastructure which are necessary to specify and execute a workflow. The FAIR principles were designed as guidelines, rather than rules, that would allow for differences in standards for different communities and for different degrees of compliance. There are many practical considerations which impact the level of FAIR-ness that can actually be achieved, including policies, traditions, and technologies. Because of these considerations, obstacles are often encountered during the workflow lifecycle that trace directly to shortcomings in the implementation of the FAIR principles. Here, we detail some cases, without naming names, in which data and workflows were Findable but otherwise lacking in areas commonly needed and expected by modern FAIR methods, tools, and users. We describe how some of these problems, all of which were overcome successfully, have motivated us to push on systems and approaches for fully FAIR workflows. Comment: 6 pages, 0 figures, accepted to ERROR 2022 workshop (see https://error-workshop.org/ for more information), to be published in proceedings of IEEE eScience 202

    WfBench: Automated Generation of Scientific Workflow Benchmarks

    The prevalence of scientific workflows with high computational demands calls for their execution on various distributed computing platforms, including large-scale leadership-class high-performance computing (HPC) clusters. To handle the deployment, monitoring, and optimization of workflow executions, many workflow systems have been developed over the past decade. There is a need for workflow benchmarks that can be used to evaluate the performance of workflow systems on current and future software stacks and hardware platforms. We present a generator of realistic workflow benchmark specifications that can be translated into benchmark code to be executed with current workflow systems. Our approach generates workflow tasks with arbitrary performance characteristics (CPU, memory, and I/O usage) and with realistic task dependency structures based on those seen in production workflows. We present experimental results showing that our approach generates benchmarks that are representative of production workflows, and we conduct a case study to demonstrate the use and usefulness of our generated benchmarks for evaluating the performance of workflow systems under different configuration scenarios.
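To make the notion of a benchmark specification concrete, here is a minimal sketch assuming a simple layered DAG: tasks are grouped into levels, each task after the first level depends on one or two tasks in the previous level, and each task carries CPU, memory, and I/O requirements. `generate_workflow`, the `Task` fields, and the units are illustrative assumptions, not WfBench's actual format; WfBench bases its dependency structures on those seen in production workflows, whereas this sketch draws them at random.

```python
import json
import random
from dataclasses import asdict, dataclass, field


@dataclass
class Task:
    name: str
    cpu_work: float      # hypothetical unit: core-seconds
    memory_mb: int
    io_read_mb: int
    io_write_mb: int
    parents: list = field(default_factory=list)


def generate_workflow(levels: int, width: int, seed: int = 0) -> list:
    """Build a layered DAG; tasks in level k > 0 depend on 1-2 tasks from level k - 1."""
    rng = random.Random(seed)  # seeded so the same spec can be regenerated
    tasks, previous = [], []
    for level in range(levels):
        current = []
        for i in range(width):
            if previous:
                k = min(len(previous), rng.randint(1, 2))
                parents = rng.sample(previous, k)
            else:
                parents = []  # first level has no dependencies
            current.append(
                Task(
                    name=f"t{level}_{i}",
                    cpu_work=round(rng.uniform(1.0, 100.0), 2),
                    memory_mb=rng.choice([256, 512, 1024, 2048]),
                    io_read_mb=rng.randint(0, 500),
                    io_write_mb=rng.randint(0, 500),
                    parents=[p.name for p in parents],
                )
            )
        tasks.extend(current)
        previous = current
    return tasks


# Serialize to JSON so a separate translator could turn it into runnable benchmark code.
spec = json.dumps([asdict(t) for t in generate_workflow(levels=3, width=4, seed=42)], indent=2)
```

Because parents are always drawn from the preceding level, the result is acyclic by construction; a generator aiming for realism would replace the random layering with structures derived from real workflow instances.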

    Rfam: updates to the RNA families database

    Rfam is a collection of RNA sequence families, represented by multiple sequence alignments and covariance models (CMs). The primary aim of Rfam is to annotate new members of known RNA families on nucleotide sequences, particularly complete genomes, using sensitive BLAST filters in combination with CMs. A minority of families with a very broad taxonomic range (e.g. tRNA and rRNA) provide the majority of the sequence annotations, whilst the majority of Rfam families (e.g. snoRNAs and miRNAs) have a limited taxonomic range and provide a limited number of annotations. Recent improvements to the website, methodologies, and data used by Rfam are discussed. Rfam is freely available on the Web at http://rfam.sanger.ac.uk/ and http://rfam.janelia.org/

    Are language production problems apparent in adults who no longer meet diagnostic criteria for attention-deficit/hyperactivity disorder?

    In this study, we examined sentence production in a sample of adults (N = 21) who had had attention-deficit/hyperactivity disorder (ADHD) as children but as adults no longer met DSM-IV diagnostic criteria (APA, 2000). This “remitted” group was assessed on a sentence production task. On each trial, participants saw two objects and a verb. Their task was to construct a sentence using the objects as arguments of the verb. Results showed more ungrammatical and disfluent utterances with one particular type of verb (i.e., participle). In a second set of analyses, we compared the remitted group to both control participants and a “persistent” group, who had ADHD as children and as adults. Results showed that remitters were more likely than controls to produce ungrammatical utterances and to make repair disfluencies, and that they patterned more similarly to ADHD participants. Conclusions focus on language output in remitted ADHD and the role of executive functions in language production.

    Response to comment on 'Amphibian fungal panzootic causes catastrophic and ongoing loss of biodiversity'

    Lambert et al. question our retrospective and holistic epidemiological assessment of the role of chytridiomycosis in amphibian declines. Their alternative assessment is narrow and provides an incomplete evaluation of the evidence. Adopting this approach limits understanding of infectious disease impacts and hampers conservation efforts. We reaffirm that our study provides unambiguous evidence that chytridiomycosis has affected at least 501 amphibian species.

    Gaze following in an asocial reptile (Eublepharis macularius)

    Gaze following is the ability to utilise information from another's gaze. It is most often seen in a social context or as a reflexive response to interesting external stimuli. Social species can potentially reveal utilisable knowledge about another's future intentions by attending to the target of their gaze. However, sensitivity to another's gaze can also be useful in even more fundamental situations, for example when it facilitates greater foraging efficiency or leads to earlier predator detection. While gaze sensitivity has been shown to be prevalent in a number of social species, little is currently known about the potential for gaze following in asocial species. The current study investigated whether an asocial reptile, the leopard gecko (Eublepharis macularius), could reliably use the visual indicators of attention to follow the gaze of a conspecific around a barrier. We ran three trial conditions and found that subjects (N = 6) responded significantly more to the conspecific demonstrator looking up at a laser stimulus projected onto an occluder during the experimental condition than during either of the two control conditions. The study's findings add to growing evidence for gaze-following ability in reptiles, which are typically categorised as asocial. Furthermore, our findings support developing comparative social cognition research suggesting that the origins of gaze following and other cognitive behaviours may be more widely distributed across taxonomic groups than hitherto thought.